COMP9414 Project 1¶

z5499630 Boyang, Peng¶

In this project, I work on time series prediction with neural network architectures, covering both a classification task and an estimation (regression) task on a modified "Air Quality" dataset. The dataset contains hourly averaged responses from chemical sensors embedded in an air quality device, recorded from March 2004 to February 2005.

Activities¶

Classification Task¶

TODO:¶

Develop a neural network to predict if the concentration of Carbon Monoxide (CO) exceeds the mean of CO(GT) values. Perform binary classification to categorize instances as above or below the threshold. Handle missing values in the dataset.

Regression Task¶

TODO:¶

Develop a neural network to predict the concentration of Nitrogen Oxides (NOx) based on other air quality features. Estimate a continuous numerical value using regression techniques. Handle missing values in the dataset.

Data Preprocessing¶

Step 1: Loading Data from files¶

In [ ]:
# z5499630 Boyang, Peng

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import tensorflow as tf
import seaborn as sns
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Input
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.regularizers import l2

# Read data from the given xlsx file
AQ_file = 'AirQualityUCI _ Students.xlsx'
AQ_data = pd.read_excel(AQ_file)

Step 2: Identify variation range for input and output variables [Answering 2.1 (a)]¶

In [ ]:
# Calculate the minimum and maximum values of each column and create a DataFrame to display the results
min_max_df = pd.DataFrame({'Min': AQ_data.min(), 'Max': AQ_data.max()})

print(min_max_df)
# min_max_df
                               Min                  Max
Date           2004-03-10 00:00:00  2005-04-01 00:00:00
Time                      00:00:00             23:00:00
CO(GT)                      -200.0                 11.9
PT08.S1(CO)                 -200.0              2007.75
NMHC(GT)                      -200                 1189
C6H6(GT)                    -200.0            63.741476
PT08.S2(NMHC)               -200.0               2214.0
NOx(GT)                     -200.0               1479.0
PT08.S3(NOx)                -200.0              2682.75
NO2(GT)                     -200.0                339.7
PT08.S4(NO2)                -200.0               2775.0
PT08.S5(O3)                 -200.0              2522.75
T                           -200.0                 44.6
RH                          -200.0            87.174999
AH                          -200.0             2.231036

Step 3: Convert and Set Date Index¶

In [ ]:
# Convert Date column to datetime
AQ_data['Date'] = pd.to_datetime(AQ_data['Date'])

# Set Date as the index
AQ_data.set_index('Date', inplace=True)

# Drop the Time column
AQ_data.drop(columns=['Time'], inplace=True)

Step 4: Plotting each variable to observe the overall behaviour of the process [Answering 2.1 (b)]¶

In [ ]:
# Calculate the number of rows needed to fit all subplots in 2 columns for compact view
nrows = (len(AQ_data.columns) + 1) // 2

fig, axes = plt.subplots(nrows=nrows, ncols=2, figsize=(20, 20))
axes = axes.flatten()

for i, column in enumerate(AQ_data.columns):
    axes[i].plot(AQ_data.index, AQ_data[column], color='purple')
    axes[i].set_title(column)

for j in range(i+1, len(axes)):
    fig.delaxes(axes[j])

plt.tight_layout()
plt.show()

Step 5: Handle Missing Values [Answering 2.1 (c)]¶

As stated in the problem context, missing values have been tagged with -200. Hence, replace -200 with NaN to mark the missing data visible in the plots above¶

Count the occurrences of -200 in each column¶

In [ ]:
# Checking missing values
missing_value_tally = (AQ_data == -200).sum()
print("Tally of missing value (-200):")
print(missing_value_tally)

# Replace the missing value with NaN
data_replaced = AQ_data.replace(-200, np.nan)

# Handle missing data using linear interpolation
data_interpolated = data_replaced.interpolate(method='linear', limit_direction='forward', axis=0)

# Plot data after interpolation
for column in data_interpolated.columns:
    plt.figure(figsize=(26, 6))
    data_interpolated[column].plot(title=f'{column} (After Interpolation)', color='purple')
    plt.xlabel('Date')
    plt.ylabel(column)
    plt.show()
Tally of missing value (-200):
CO(GT)           1585
PT08.S1(CO)       366
NMHC(GT)         7525
C6H6(GT)          366
PT08.S2(NMHC)     366
NOx(GT)          1573
PT08.S3(NOx)      366
NO2(GT)          1576
PT08.S4(NO2)      366
PT08.S5(O3)       366
T                 366
RH                366
AH                366
dtype: int64

Step 6: Detect and Handle Outliers [Answering 2.1 (c) cont.]¶

In descriptive statistics, the interquartile range (IQR) is a measure of statistical dispersion, which is the spread of the data. It is defined as the difference between the 75th and 25th percentiles of the data. To calculate the IQR, the data set is divided into quartiles, or four rank-ordered even parts via linear interpolation. These quartiles are denoted by Q1 (also called the lower quartile), Q2 (the median), and Q3 (also called the upper quartile). The lower quartile corresponds with the 25th percentile and the upper quartile corresponds with the 75th percentile, so IQR = Q3 − Q1. The IQR is an example of a trimmed estimator, defined as the 25% trimmed range, which enhances the accuracy of dataset statistics by dropping lower contribution, outlying points.[1]
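
As a quick toy illustration (made-up numbers, not the project data), a minimal pandas sketch of how the IQR bounds flag an obvious outlier:

import pandas as pd

s = pd.Series([2, 4, 4, 5, 6, 7, 30])          # 30 is an obvious outlier
q1, q3 = s.quantile(0.25), s.quantile(0.75)    # 4.0 and 6.5
iqr = q3 - q1                                  # 2.5
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # 0.25 and 10.25
print(s[(s < lower) | (s > upper)])            # flags only the value 30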

In [ ]:
# Detect outliers using IQR
Q1 = data_interpolated.quantile(0.25)
Q3 = data_interpolated.quantile(0.75)
IQR = Q3 - Q1

# Define outliers as points outside 1.5*IQR range
outliers_lower_bound = Q1 - 1.5 * IQR
outliers_upper_bound = Q3 + 1.5 * IQR

# Logging all the outliers
outliers = (data_interpolated < outliers_lower_bound) | (data_interpolated > outliers_upper_bound)

# Plot data with outliers marked
for column in data_interpolated.columns:
    plt.figure(figsize=(20, 4))
    plt.plot(data_interpolated[column], label=column, color='purple')
    plt.plot(data_interpolated[column][outliers[column]], 'r*', label='Outliers')
    plt.title(f'{column} (With Outliers Marked)')
    plt.xlabel('Date')
    plt.ylabel(column)
    plt.legend()
    plt.show()
    
# Replace outliers with NaN
data_interpolated[outliers] = np.nan

# Fill NaN values resulting from outlier detection using linear interpolation
data_cleaned = data_interpolated.interpolate(method='linear', limit_direction='forward', axis=0)

# data_cleaned = data_interpolated

Classification Task¶

Step 1: Create target variable¶

In [ ]:
# Calculate the mean value for CO(GT), excluding missing values
co_mean = data_cleaned['CO(GT)'].mean()

# Create the binary target variable CO_Target in the data_cleaned DataFrame
# It assigns a value of 1 to the CO_Target column
# if the corresponding value in the CO(GT) column is greater than the calculated mean (co_mean), and 0 otherwise
data_cleaned['CO_Target'] = (data_cleaned['CO(GT)'] > co_mean).astype(int)

Step 2: Compute and Plot Correlation Matrix for Feature selection¶

Since NMHC(GT) has too many missing values, the few remaining valid readings behave like outliers, so I dropped this feature from all further processing¶

In [ ]:
# Compute and plot the correlation matrix
plt.figure(figsize=(8, 8))
corr_matrix = data_cleaned.drop(columns=['NMHC(GT)', 'CO_Target']).corr()
sns.heatmap(corr_matrix, annot=True, cmap='Greens')
plt.title('Feature Correlation Matrix')
plt.show()

Step 3: Prepare Features and Target¶

Feature engineering:

Rolling statistics compute a statistic over a moving window of fixed size across a time series. This technique is particularly useful for capturing temporal trends and patterns over time.

Here, two new rolling statistics are calculated with a window size of 24, i.e. one day of hourly readings.

Rolling Mean: the mean of the current and previous 23 values for each entry in the column, stored in a new column named f'{col}_rolling_mean'.

Rolling Standard Deviation: the standard deviation of the current and previous 23 values for each entry in the column, stored in a new column named f'{col}_rolling_std'.
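
For intuition, a minimal sketch on toy data (window of 3 instead of 24) shows what these rolling calls produce, including the leading NaN entries that are dropped in the cell below:

import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])
print(s.rolling(window=3).mean())  # NaN, NaN, 2.0, 3.0, 4.0
print(s.rolling(window=3).std())   # NaN, NaN, 1.0, 1.0, 1.0 (sample std)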

The dropped features are:

'CO(GT)' and 'CO_Target': These are the target variable and its binary representation, which should not be part of the features.

'NMHC(GT)': Dropped due to massive missing values.

'T', 'RH', 'AH': These features were dropped based on their correlation with the target variable and other features.

In [ ]:
# Feature Engineering
# Rolling statistics
for col in data_cleaned.columns:
    if col not in ['CO(GT)', 'CO_Target', 'NMHC(GT)', 'T', 'RH', 'AH']:
        data_cleaned[f'{col}_rolling_mean'] = data_cleaned[col].rolling(window=24).mean()
        data_cleaned[f'{col}_rolling_std'] = data_cleaned[col].rolling(window=24).std()

# Drop rows with NaN values created by rolling statistics
data_cleaned.dropna(inplace=True)

# Prepare the features (X) and target (y)
# X = data_cleaned.drop(columns=['CO(GT)', 'CO_Target', 'NMHC(GT)', 'PT08.S4(NO2)', 'T', 'RH', 'AH'])
X = data_cleaned.drop(columns=['CO(GT)', 'CO_Target', 'NMHC(GT)', 'T', 'RH', 'AH'])

y = data_cleaned['CO_Target']

Step 4: Split and Standardize Data [Answering 2.1 (d)]¶

Train size: 70%, Validation size: 15%, Test size: 15%.¶

In [ ]:
# Split the data into training and combined validation/test sets
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.3)

# Further split the combined set into validation and test sets
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5)

# Standardize the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_val_scaled = scaler.transform(X_val)
X_test_scaled = scaler.transform(X_test)

print(f"Mean value for CO(GT): {co_mean}")
print(f"Total data size: {X.shape}")
print(f"Training data size: {X_train.shape}")
print(f"Validation data size: {X_val.shape}")
print(f"Testing data size: {X_test.shape}")
Mean value for CO(GT): 2.090075376884422
Total data size: (7308, 24)
Training data size: (5115, 24)
Validation data size: (1096, 24)
Testing data size: (1097, 24)

Step 5: Build and Train the Neural Network [Answering 2.2, 2.3, 2.4(a)]¶

Total Number of Layers: 5 (excluding the input layer)¶

Dense Layer 1: 64 units, ReLU activation

Dropout Layer 1: 0.3 dropout rate

Dense Layer 2: 16 units, ReLU activation

Dropout Layer 2: 0.2 dropout rate

Output Layer: 1 unit, Sigmoid activation

Dropout Rate:¶

Higher Dropout Rate: More neurons dropped, stronger regularization, higher risk of underfitting.

Lower Dropout Rate: Fewer neurons dropped, weaker regularization, higher risk of overfitting.
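
Conceptually, dropout zeroes a random fraction of activations during training and rescales the survivors. A minimal NumPy sketch of inverted dropout (the scheme Keras uses), for illustration only:

import numpy as np

def dropout(x, rate, rng):
    # keep each unit with probability (1 - rate); rescale so the
    # expected activation matches what the layer outputs at inference
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

rng = np.random.default_rng(0)
print(dropout(np.ones(10), rate=0.3, rng=rng))  # kept entries become 1/0.7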

L2 Regularization¶

0.001. By adding a penalty for large weights, L2 regularization helps to prevent the model from fitting the training data too closely, which can lead to better generalization on unseen data.
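
Concretely, kernel_regularizer=l2(0.001) adds 0.001 * sum(w**2) over that layer's weights to the training loss; a toy sketch of the penalty term:

import numpy as np

weights = np.array([0.5, -1.2, 0.8])       # toy layer weights
l2_penalty = 0.001 * np.sum(weights ** 2)  # 0.001 * 2.33 = 0.00233
print(l2_penalty)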

Training Parameters¶

Loss Function: binary_crossentropy. This loss function is used for binary classification problems, where the goal is to predict one of two possible outcomes.

Optimizer:¶

Adam: the Adam (Adaptive Moment Estimation) optimizer is used for both tasks.

It combines the advantages of two other extensions of stochastic gradient descent, using both adaptive learning rates and momentum.
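
A minimal sketch of a single Adam update for one parameter vector, using the standard defaults (beta1=0.9, beta2=0.999; Keras uses epsilon=1e-7):

import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.0012, beta1=0.9, beta2=0.999, eps=1e-7):
    m = beta1 * m + (1 - beta1) * grad       # moving average of gradients (momentum)
    v = beta2 * v + (1 - beta2) * grad ** 2  # moving average of squared gradients
    m_hat = m / (1 - beta1 ** t)             # bias correction: moments start at zero
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)  # adaptive per-parameter step
    return theta, m, v

theta, m, v = adam_step(np.zeros(2), np.array([0.1, -0.2]), np.zeros(2), np.zeros(2), t=1)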

Learning Rate:¶

lr = 0.0012. This controls the step size during the optimization process. A smaller learning rate can lead to more precise convergence but may require more epochs to train.

Batch Size:¶

64. The number of training samples used in one forward and backward pass. A smaller batch size requires less memory and provides more updates to the model weights, while a larger batch size provides a more accurate estimate of the gradient but requires more memory.

Epochs:¶

150. The number of times the entire training dataset is passed forward and backward through the neural network. More epochs can lead to better training but also increase the risk of overfitting.

Check Possible Overfitting¶

Verify the training and validation accuracy: if the training accuracy continues to increase while the validation accuracy plateaus or decreases, this indicates overfitting.
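
One common safeguard, shown here only as an option and not used in the training run below, is a Keras EarlyStopping callback that halts training and restores the best weights once the validation loss stops improving:

from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
# usage: model.fit(..., callbacks=[early_stop])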

In [ ]:
# Build the neural network
classification_model = Sequential([
    Input(shape=(X_train_scaled.shape[1],)),
    Dense(64, activation='relu', kernel_regularizer=l2(0.001)),
    Dropout(0.3),
    Dense(16, activation='relu', kernel_regularizer=l2(0.001)),
    Dropout(0.2),
    Dense(1, activation='sigmoid')
])

# Compile the model
classification_model.compile(optimizer=Adam(learning_rate=0.0012), loss='binary_crossentropy', metrics=['accuracy'])

# Show the model summary, including the total number of parameters
classification_model.summary()

# Train the neural network
history = classification_model.fit(X_train_scaled, y_train, epochs=150, batch_size=64, validation_data=(X_val_scaled, y_val))

# Plot combined training history
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Plot accuracy
ax1.plot(history.history['accuracy'], label='Training Accuracy')
ax1.plot(history.history['val_accuracy'], label='Validation Accuracy')
ax1.set_title('Model Accuracy')
ax1.set_xlabel('Epochs')
ax1.set_ylabel('Accuracy')
ax1.legend()

# Plot loss
ax2.plot(history.history['loss'], label='Training Loss')
ax2.plot(history.history['val_loss'], label='Validation Loss')
ax2.set_title('Model Loss')
ax2.set_xlabel('Epochs')
ax2.set_ylabel('Loss')
ax2.legend()

plt.tight_layout()
plt.show()

# Save the classification model
classification_model.save('classification_model.keras')
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ dense (Dense)                   │ (None, 64)             │         1,600 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout (Dropout)               │ (None, 64)             │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_1 (Dense)                 │ (None, 16)             │         1,040 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_1 (Dropout)             │ (None, 16)             │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_2 (Dense)                 │ (None, 1)              │            17 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 2,657 (10.38 KB)
 Trainable params: 2,657 (10.38 KB)
 Non-trainable params: 0 (0.00 B)
Epoch 1/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.7063 - loss: 0.5959 - val_accuracy: 0.8814 - val_loss: 0.3711
Epoch 2/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 555us/step - accuracy: 0.8878 - loss: 0.3778 - val_accuracy: 0.8823 - val_loss: 0.3376
Epoch 3/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 555us/step - accuracy: 0.8915 - loss: 0.3356 - val_accuracy: 0.8942 - val_loss: 0.3117
Epoch 4/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 536us/step - accuracy: 0.8960 - loss: 0.3173 - val_accuracy: 0.9042 - val_loss: 0.2966
Epoch 5/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 543us/step - accuracy: 0.9026 - loss: 0.3043 - val_accuracy: 0.9042 - val_loss: 0.2880
Epoch 6/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 569us/step - accuracy: 0.8968 - loss: 0.3144 - val_accuracy: 0.9106 - val_loss: 0.2803
Epoch 7/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 624us/step - accuracy: 0.9049 - loss: 0.3012 - val_accuracy: 0.9088 - val_loss: 0.2759
Epoch 8/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 626us/step - accuracy: 0.9002 - loss: 0.2912 - val_accuracy: 0.9106 - val_loss: 0.2695
Epoch 9/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 854us/step - accuracy: 0.9086 - loss: 0.2735 - val_accuracy: 0.9097 - val_loss: 0.2685
Epoch 10/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 576us/step - accuracy: 0.8983 - loss: 0.3041 - val_accuracy: 0.9088 - val_loss: 0.2663
Epoch 11/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 561us/step - accuracy: 0.9029 - loss: 0.2805 - val_accuracy: 0.9179 - val_loss: 0.2609
Epoch 12/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 531us/step - accuracy: 0.9013 - loss: 0.2798 - val_accuracy: 0.9161 - val_loss: 0.2584
Epoch 13/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 553us/step - accuracy: 0.9070 - loss: 0.2711 - val_accuracy: 0.9161 - val_loss: 0.2572
Epoch 14/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 568us/step - accuracy: 0.9118 - loss: 0.2716 - val_accuracy: 0.9161 - val_loss: 0.2540
Epoch 15/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 550us/step - accuracy: 0.9090 - loss: 0.2620 - val_accuracy: 0.9197 - val_loss: 0.2518
Epoch 16/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 565us/step - accuracy: 0.9101 - loss: 0.2632 - val_accuracy: 0.9151 - val_loss: 0.2491
Epoch 17/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 562us/step - accuracy: 0.9115 - loss: 0.2517 - val_accuracy: 0.9142 - val_loss: 0.2504
Epoch 18/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 569us/step - accuracy: 0.9123 - loss: 0.2491 - val_accuracy: 0.9124 - val_loss: 0.2483
Epoch 19/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 559us/step - accuracy: 0.9104 - loss: 0.2664 - val_accuracy: 0.9179 - val_loss: 0.2438
Epoch 20/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 556us/step - accuracy: 0.9081 - loss: 0.2627 - val_accuracy: 0.9170 - val_loss: 0.2424
Epoch 21/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 566us/step - accuracy: 0.9129 - loss: 0.2502 - val_accuracy: 0.9133 - val_loss: 0.2433
Epoch 22/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 544us/step - accuracy: 0.9047 - loss: 0.2617 - val_accuracy: 0.9170 - val_loss: 0.2408
Epoch 23/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 541us/step - accuracy: 0.9122 - loss: 0.2593 - val_accuracy: 0.9151 - val_loss: 0.2382
Epoch 24/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 559us/step - accuracy: 0.9087 - loss: 0.2472 - val_accuracy: 0.9161 - val_loss: 0.2407
Epoch 25/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 563us/step - accuracy: 0.9114 - loss: 0.2444 - val_accuracy: 0.9206 - val_loss: 0.2367
Epoch 26/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 550us/step - accuracy: 0.9051 - loss: 0.2610 - val_accuracy: 0.9188 - val_loss: 0.2379
Epoch 27/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 562us/step - accuracy: 0.9110 - loss: 0.2463 - val_accuracy: 0.9161 - val_loss: 0.2341
Epoch 28/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 543us/step - accuracy: 0.9072 - loss: 0.2480 - val_accuracy: 0.9243 - val_loss: 0.2356
Epoch 29/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 550us/step - accuracy: 0.9216 - loss: 0.2282 - val_accuracy: 0.9224 - val_loss: 0.2309
Epoch 30/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 791us/step - accuracy: 0.9167 - loss: 0.2413 - val_accuracy: 0.9170 - val_loss: 0.2313
Epoch 31/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 584us/step - accuracy: 0.9134 - loss: 0.2322 - val_accuracy: 0.9069 - val_loss: 0.2377
Epoch 32/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 535us/step - accuracy: 0.9159 - loss: 0.2366 - val_accuracy: 0.9179 - val_loss: 0.2301
Epoch 33/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 544us/step - accuracy: 0.9187 - loss: 0.2319 - val_accuracy: 0.9188 - val_loss: 0.2337
Epoch 34/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 577us/step - accuracy: 0.9077 - loss: 0.2498 - val_accuracy: 0.9206 - val_loss: 0.2313
Epoch 35/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 565us/step - accuracy: 0.9201 - loss: 0.2337 - val_accuracy: 0.9106 - val_loss: 0.2342
Epoch 36/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 542us/step - accuracy: 0.9127 - loss: 0.2439 - val_accuracy: 0.9188 - val_loss: 0.2248
Epoch 37/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 517us/step - accuracy: 0.9245 - loss: 0.2183 - val_accuracy: 0.9188 - val_loss: 0.2261
Epoch 38/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 521us/step - accuracy: 0.9198 - loss: 0.2274 - val_accuracy: 0.9206 - val_loss: 0.2254
Epoch 39/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 518us/step - accuracy: 0.9174 - loss: 0.2278 - val_accuracy: 0.9234 - val_loss: 0.2212
Epoch 40/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 519us/step - accuracy: 0.9180 - loss: 0.2248 - val_accuracy: 0.9224 - val_loss: 0.2239
Epoch 41/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 528us/step - accuracy: 0.9120 - loss: 0.2360 - val_accuracy: 0.9279 - val_loss: 0.2250
Epoch 42/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 543us/step - accuracy: 0.9164 - loss: 0.2290 - val_accuracy: 0.9206 - val_loss: 0.2214
Epoch 43/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 597us/step - accuracy: 0.9152 - loss: 0.2318 - val_accuracy: 0.9161 - val_loss: 0.2231
Epoch 44/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 865us/step - accuracy: 0.9184 - loss: 0.2218 - val_accuracy: 0.9188 - val_loss: 0.2211
Epoch 45/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 612us/step - accuracy: 0.9191 - loss: 0.2289 - val_accuracy: 0.9206 - val_loss: 0.2181
Epoch 46/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 560us/step - accuracy: 0.9139 - loss: 0.2270 - val_accuracy: 0.9197 - val_loss: 0.2201
Epoch 47/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 588us/step - accuracy: 0.9252 - loss: 0.2129 - val_accuracy: 0.9188 - val_loss: 0.2237
Epoch 48/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 613us/step - accuracy: 0.9186 - loss: 0.2260 - val_accuracy: 0.9188 - val_loss: 0.2147
Epoch 49/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 588us/step - accuracy: 0.9250 - loss: 0.2180 - val_accuracy: 0.9234 - val_loss: 0.2185
Epoch 50/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 547us/step - accuracy: 0.9147 - loss: 0.2261 - val_accuracy: 0.9243 - val_loss: 0.2151
Epoch 51/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 523us/step - accuracy: 0.9213 - loss: 0.2194 - val_accuracy: 0.9188 - val_loss: 0.2194
Epoch 52/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 525us/step - accuracy: 0.9197 - loss: 0.2150 - val_accuracy: 0.9206 - val_loss: 0.2177
Epoch 53/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 517us/step - accuracy: 0.9227 - loss: 0.2101 - val_accuracy: 0.9124 - val_loss: 0.2303
Epoch 54/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 529us/step - accuracy: 0.9166 - loss: 0.2186 - val_accuracy: 0.9142 - val_loss: 0.2192
Epoch 55/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 559us/step - accuracy: 0.9221 - loss: 0.2168 - val_accuracy: 0.9170 - val_loss: 0.2220
Epoch 56/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 569us/step - accuracy: 0.9256 - loss: 0.2220 - val_accuracy: 0.9206 - val_loss: 0.2169
Epoch 57/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 578us/step - accuracy: 0.9267 - loss: 0.2135 - val_accuracy: 0.9206 - val_loss: 0.2179
Epoch 58/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 577us/step - accuracy: 0.9272 - loss: 0.2088 - val_accuracy: 0.9161 - val_loss: 0.2207
Epoch 59/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 912us/step - accuracy: 0.9148 - loss: 0.2252 - val_accuracy: 0.9243 - val_loss: 0.2143
Epoch 60/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 621us/step - accuracy: 0.9238 - loss: 0.2144 - val_accuracy: 0.9252 - val_loss: 0.2182
Epoch 61/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 626us/step - accuracy: 0.9263 - loss: 0.2157 - val_accuracy: 0.9151 - val_loss: 0.2162
Epoch 62/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 576us/step - accuracy: 0.9225 - loss: 0.2102 - val_accuracy: 0.9243 - val_loss: 0.2134
Epoch 63/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 580us/step - accuracy: 0.9275 - loss: 0.2119 - val_accuracy: 0.9279 - val_loss: 0.2074
Epoch 64/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 622us/step - accuracy: 0.9207 - loss: 0.2083 - val_accuracy: 0.9270 - val_loss: 0.2094
Epoch 65/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 596us/step - accuracy: 0.9142 - loss: 0.2214 - val_accuracy: 0.9243 - val_loss: 0.2091
Epoch 66/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 610us/step - accuracy: 0.9315 - loss: 0.2028 - val_accuracy: 0.9261 - val_loss: 0.2133
Epoch 67/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 578us/step - accuracy: 0.9257 - loss: 0.2149 - val_accuracy: 0.9297 - val_loss: 0.2069
Epoch 68/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 543us/step - accuracy: 0.9250 - loss: 0.2164 - val_accuracy: 0.9288 - val_loss: 0.2046
Epoch 69/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 536us/step - accuracy: 0.9250 - loss: 0.2114 - val_accuracy: 0.9316 - val_loss: 0.2038
Epoch 70/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 541us/step - accuracy: 0.9244 - loss: 0.2092 - val_accuracy: 0.9161 - val_loss: 0.2117
Epoch 71/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 536us/step - accuracy: 0.9202 - loss: 0.2127 - val_accuracy: 0.9243 - val_loss: 0.2086
Epoch 72/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 880us/step - accuracy: 0.9162 - loss: 0.2145 - val_accuracy: 0.9197 - val_loss: 0.2088
Epoch 73/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 555us/step - accuracy: 0.9270 - loss: 0.2061 - val_accuracy: 0.9307 - val_loss: 0.2054
Epoch 74/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 629us/step - accuracy: 0.9244 - loss: 0.2107 - val_accuracy: 0.9243 - val_loss: 0.2096
Epoch 75/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 633us/step - accuracy: 0.9225 - loss: 0.2117 - val_accuracy: 0.9297 - val_loss: 0.2057
Epoch 76/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 628us/step - accuracy: 0.9269 - loss: 0.2063 - val_accuracy: 0.9270 - val_loss: 0.2076
Epoch 77/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 608us/step - accuracy: 0.9193 - loss: 0.2265 - val_accuracy: 0.9252 - val_loss: 0.2084
Epoch 78/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 637us/step - accuracy: 0.9229 - loss: 0.2293 - val_accuracy: 0.9252 - val_loss: 0.2117
Epoch 79/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 606us/step - accuracy: 0.9229 - loss: 0.2112 - val_accuracy: 0.9252 - val_loss: 0.2031
Epoch 80/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 603us/step - accuracy: 0.9342 - loss: 0.1959 - val_accuracy: 0.9261 - val_loss: 0.2044
Epoch 81/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 605us/step - accuracy: 0.9225 - loss: 0.2097 - val_accuracy: 0.9234 - val_loss: 0.2123
Epoch 82/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 619us/step - accuracy: 0.9261 - loss: 0.1990 - val_accuracy: 0.9234 - val_loss: 0.2104
Epoch 83/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 606us/step - accuracy: 0.9288 - loss: 0.2095 - val_accuracy: 0.9279 - val_loss: 0.2022
Epoch 84/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 572us/step - accuracy: 0.9280 - loss: 0.2017 - val_accuracy: 0.9307 - val_loss: 0.2032
Epoch 85/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 845us/step - accuracy: 0.9250 - loss: 0.2074 - val_accuracy: 0.9307 - val_loss: 0.2064
Epoch 86/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 563us/step - accuracy: 0.9227 - loss: 0.2138 - val_accuracy: 0.9215 - val_loss: 0.2073
Epoch 87/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 553us/step - accuracy: 0.9327 - loss: 0.1896 - val_accuracy: 0.9316 - val_loss: 0.2028
Epoch 88/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 542us/step - accuracy: 0.9325 - loss: 0.1935 - val_accuracy: 0.9279 - val_loss: 0.2050
Epoch 89/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 553us/step - accuracy: 0.9301 - loss: 0.2014 - val_accuracy: 0.9288 - val_loss: 0.2047
Epoch 90/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 572us/step - accuracy: 0.9348 - loss: 0.1859 - val_accuracy: 0.9297 - val_loss: 0.1985
Epoch 91/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 571us/step - accuracy: 0.9284 - loss: 0.2059 - val_accuracy: 0.9297 - val_loss: 0.1985
Epoch 92/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 554us/step - accuracy: 0.9342 - loss: 0.1925 - val_accuracy: 0.9389 - val_loss: 0.1966
Epoch 93/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 548us/step - accuracy: 0.9336 - loss: 0.1893 - val_accuracy: 0.9325 - val_loss: 0.2006
Epoch 94/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 537us/step - accuracy: 0.9362 - loss: 0.1856 - val_accuracy: 0.9279 - val_loss: 0.1996
Epoch 95/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 547us/step - accuracy: 0.9281 - loss: 0.2049 - val_accuracy: 0.9398 - val_loss: 0.1933
Epoch 96/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 564us/step - accuracy: 0.9303 - loss: 0.1968 - val_accuracy: 0.9261 - val_loss: 0.2055
Epoch 97/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 526us/step - accuracy: 0.9295 - loss: 0.2032 - val_accuracy: 0.9343 - val_loss: 0.1998
Epoch 98/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 506us/step - accuracy: 0.9338 - loss: 0.1912 - val_accuracy: 0.9334 - val_loss: 0.2017
Epoch 99/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 520us/step - accuracy: 0.9251 - loss: 0.2100 - val_accuracy: 0.9288 - val_loss: 0.2063
Epoch 100/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 502us/step - accuracy: 0.9317 - loss: 0.2004 - val_accuracy: 0.9325 - val_loss: 0.1977
Epoch 101/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 822us/step - accuracy: 0.9333 - loss: 0.1906 - val_accuracy: 0.9334 - val_loss: 0.1976
Epoch 102/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 550us/step - accuracy: 0.9305 - loss: 0.2007 - val_accuracy: 0.9325 - val_loss: 0.1975
Epoch 103/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 588us/step - accuracy: 0.9354 - loss: 0.1943 - val_accuracy: 0.9370 - val_loss: 0.1934
Epoch 104/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 610us/step - accuracy: 0.9341 - loss: 0.1911 - val_accuracy: 0.9361 - val_loss: 0.1908
Epoch 105/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 605us/step - accuracy: 0.9279 - loss: 0.1898 - val_accuracy: 0.9334 - val_loss: 0.1935
Epoch 106/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 548us/step - accuracy: 0.9280 - loss: 0.2021 - val_accuracy: 0.9370 - val_loss: 0.1941
Epoch 107/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 575us/step - accuracy: 0.9385 - loss: 0.1872 - val_accuracy: 0.9380 - val_loss: 0.1919
Epoch 108/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 593us/step - accuracy: 0.9330 - loss: 0.1865 - val_accuracy: 0.9370 - val_loss: 0.1986
Epoch 109/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 577us/step - accuracy: 0.9381 - loss: 0.1903 - val_accuracy: 0.9316 - val_loss: 0.1966
Epoch 110/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 548us/step - accuracy: 0.9327 - loss: 0.1924 - val_accuracy: 0.9343 - val_loss: 0.1956
Epoch 111/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 530us/step - accuracy: 0.9348 - loss: 0.1873 - val_accuracy: 0.9398 - val_loss: 0.1868
Epoch 112/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 519us/step - accuracy: 0.9309 - loss: 0.1941 - val_accuracy: 0.9370 - val_loss: 0.1908
Epoch 113/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 518us/step - accuracy: 0.9334 - loss: 0.1972 - val_accuracy: 0.9370 - val_loss: 0.1916
Epoch 114/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 763us/step - accuracy: 0.9323 - loss: 0.1961 - val_accuracy: 0.9352 - val_loss: 0.1957
Epoch 115/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 507us/step - accuracy: 0.9354 - loss: 0.1916 - val_accuracy: 0.9407 - val_loss: 0.1858
Epoch 116/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 511us/step - accuracy: 0.9379 - loss: 0.1826 - val_accuracy: 0.9398 - val_loss: 0.1932
Epoch 117/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 509us/step - accuracy: 0.9315 - loss: 0.1956 - val_accuracy: 0.9352 - val_loss: 0.1911
Epoch 118/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 513us/step - accuracy: 0.9375 - loss: 0.1807 - val_accuracy: 0.9307 - val_loss: 0.1932
Epoch 119/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 514us/step - accuracy: 0.9344 - loss: 0.1871 - val_accuracy: 0.9334 - val_loss: 0.1915
Epoch 120/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 525us/step - accuracy: 0.9393 - loss: 0.1847 - val_accuracy: 0.9389 - val_loss: 0.1925
Epoch 121/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 510us/step - accuracy: 0.9348 - loss: 0.1891 - val_accuracy: 0.9416 - val_loss: 0.1882
Epoch 122/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 525us/step - accuracy: 0.9386 - loss: 0.1763 - val_accuracy: 0.9352 - val_loss: 0.1925
Epoch 123/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 513us/step - accuracy: 0.9308 - loss: 0.1908 - val_accuracy: 0.9370 - val_loss: 0.1924
Epoch 124/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 522us/step - accuracy: 0.9287 - loss: 0.1927 - val_accuracy: 0.9380 - val_loss: 0.1941
Epoch 125/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 513us/step - accuracy: 0.9375 - loss: 0.1855 - val_accuracy: 0.9398 - val_loss: 0.1925
Epoch 126/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 512us/step - accuracy: 0.9426 - loss: 0.1866 - val_accuracy: 0.9361 - val_loss: 0.1965
Epoch 127/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 517us/step - accuracy: 0.9366 - loss: 0.1808 - val_accuracy: 0.9425 - val_loss: 0.1861
Epoch 128/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 512us/step - accuracy: 0.9340 - loss: 0.1922 - val_accuracy: 0.9352 - val_loss: 0.1925
Epoch 129/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 536us/step - accuracy: 0.9344 - loss: 0.1876 - val_accuracy: 0.9407 - val_loss: 0.1865
Epoch 130/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 828us/step - accuracy: 0.9352 - loss: 0.1846 - val_accuracy: 0.9370 - val_loss: 0.1921
Epoch 131/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 545us/step - accuracy: 0.9364 - loss: 0.1888 - val_accuracy: 0.9425 - val_loss: 0.1877
Epoch 132/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 574us/step - accuracy: 0.9432 - loss: 0.1701 - val_accuracy: 0.9343 - val_loss: 0.1889
Epoch 133/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 603us/step - accuracy: 0.9390 - loss: 0.1852 - val_accuracy: 0.9343 - val_loss: 0.1946
Epoch 134/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 608us/step - accuracy: 0.9372 - loss: 0.1897 - val_accuracy: 0.9407 - val_loss: 0.1864
Epoch 135/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 557us/step - accuracy: 0.9461 - loss: 0.1656 - val_accuracy: 0.9380 - val_loss: 0.1866
Epoch 136/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 612us/step - accuracy: 0.9374 - loss: 0.1783 - val_accuracy: 0.9370 - val_loss: 0.1888
Epoch 137/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 603us/step - accuracy: 0.9376 - loss: 0.1740 - val_accuracy: 0.9407 - val_loss: 0.1914
Epoch 138/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 586us/step - accuracy: 0.9361 - loss: 0.1863 - val_accuracy: 0.9370 - val_loss: 0.1859
Epoch 139/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 576us/step - accuracy: 0.9370 - loss: 0.1883 - val_accuracy: 0.9398 - val_loss: 0.1856
Epoch 140/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 586us/step - accuracy: 0.9401 - loss: 0.1795 - val_accuracy: 0.9407 - val_loss: 0.1877
Epoch 141/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 785us/step - accuracy: 0.9380 - loss: 0.1809 - val_accuracy: 0.9380 - val_loss: 0.1864
Epoch 142/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 585us/step - accuracy: 0.9385 - loss: 0.1847 - val_accuracy: 0.9389 - val_loss: 0.1922
Epoch 143/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 636us/step - accuracy: 0.9366 - loss: 0.1821 - val_accuracy: 0.9434 - val_loss: 0.1885
Epoch 144/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 645us/step - accuracy: 0.9438 - loss: 0.1677 - val_accuracy: 0.9398 - val_loss: 0.1890
Epoch 145/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 621us/step - accuracy: 0.9409 - loss: 0.1817 - val_accuracy: 0.9370 - val_loss: 0.1968
Epoch 146/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 601us/step - accuracy: 0.9352 - loss: 0.1872 - val_accuracy: 0.9334 - val_loss: 0.1924
Epoch 147/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 563us/step - accuracy: 0.9380 - loss: 0.1814 - val_accuracy: 0.9425 - val_loss: 0.1847
Epoch 148/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 590us/step - accuracy: 0.9387 - loss: 0.1809 - val_accuracy: 0.9398 - val_loss: 0.1873
Epoch 149/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 658us/step - accuracy: 0.9425 - loss: 0.1751 - val_accuracy: 0.9361 - val_loss: 0.1920
Epoch 150/150
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 978us/step - accuracy: 0.9433 - loss: 0.1739 - val_accuracy: 0.9370 - val_loss: 0.1882

Step 6: Evaluate the Model [Answering 2.4(b)]¶

In [ ]:
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score

# Evaluate and Predict the model on the test set
# The decision threshold is set to 0.5 since this is binary classification
test_loss, test_accuracy = classification_model.evaluate(X_test_scaled, y_test)
y_test_pred = (classification_model.predict(X_test_scaled) > 0.5).astype(int)

# Generate the confusion matrix
conf_matrix = confusion_matrix(y_test, y_test_pred)

# Extract TP, FP, TN, FN from the confusion matrix
tn, fp, fn, tp = conf_matrix.ravel()

# Calculate accuracy
accuracy = accuracy_score(y_test, y_test_pred)

# Calculate precision
precision = precision_score(y_test, y_test_pred)

# Number of samples
n_samples = len(y_test)

# Display the results in a table
classification_results = pd.DataFrame(
    {
        "Metric": [
            "True Positives (TP)",
            "False Positives (FP)",
            "True Negatives (TN)",
            "False Negatives (FN)",
            "Accuracy",
            "Precision",
        ],
        "Value": [tp, fp, tn, fn, f"{accuracy * 100:.4f}%", f"{precision * 100:.4f}%"],
    }
)

print(classification_results)

fig, ax = plt.subplots(figsize=(4, 1))

ax.xaxis.set_visible(False)
ax.yaxis.set_visible(False)
ax.set_frame_on(False)

table = ax.table(
    cellText=[[f"{accuracy * 100:.4f}%", f"{precision * 100:.4f}%", n_samples]],
    colLabels=["Accuracy", "Precision", "#Samples"],
    cellLoc="center",
    loc="center",
)

plt.title(
    "Accuracy and precision for the test data for the classification task", fontsize=12
)
plt.show()

# Plot the confusion matrix
plt.figure(figsize=(5, 4))
sns.heatmap(conf_matrix, annot=True, fmt="d")
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.title("Confusion Matrix")
plt.show()
35/35 ━━━━━━━━━━━━━━━━━━━━ 0s 330us/step - accuracy: 0.9390 - loss: 0.1909
35/35 ━━━━━━━━━━━━━━━━━━━━ 0s 598us/step
                 Metric     Value
0   True Positives (TP)       453
1  False Positives (FP)        40
2   True Negatives (TN)       573
3  False Negatives (FN)        31
4              Accuracy  93.5278%
5             Precision  91.8864%
In [ ]:
# Code delimiter
pass

Regression Task¶

Step 1: Read and Preprocess the Data as Above¶

In [ ]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error
import tensorflow as tf
import seaborn as sns
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense, Dropout, Input
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.regularizers import l2

# Read data from the given xlsx file
AQ_file = 'AirQualityUCI _ Students.xlsx'
AQ_data = pd.read_excel(AQ_file)

# Calculate the minimum and maximum values of each column and create a DataFrame to display the results
min_max_df = pd.DataFrame({'Min': AQ_data.min(), 'Max': AQ_data.max()})

print(min_max_df)
# min_max_df

# Convert Date column to datetime
AQ_data['Date'] = pd.to_datetime(AQ_data['Date'])

# Set Date as the index
AQ_data.set_index('Date', inplace=True)

# Drop the Time column
AQ_data.drop(columns=['Time'], inplace=True)

# Checking missing values
# As stated in the problem context, missing values have been tagged with -200
# Hence, replace -200 with NaN for missing data
# Count the occurrences of -200 in each column
missing_value_tally = (AQ_data == -200).sum()
print("Tally of missing value (-200):")
print(missing_value_tally)
data_replaced = AQ_data.replace(-200, np.nan)

# Handle missing data using linear interpolation
data_interpolated = data_replaced.interpolate(method='linear', limit_direction='forward', axis=0)

# Detect outliers using IQR
Q1 = data_interpolated.quantile(0.25)
Q3 = data_interpolated.quantile(0.75)
IQR = Q3 - Q1

# Define outliers as points outside 1.5*IQR range
outliers_lower_bound = Q1 - 1.5 * IQR
outliers_upper_bound = Q3 + 1.5 * IQR

# Logging all the outliers
outliers = (data_interpolated < outliers_lower_bound) | (data_interpolated > outliers_upper_bound)

# Plot data with outliers marked
for column in data_interpolated.columns:
    plt.figure(figsize=(20, 4))
    plt.plot(data_interpolated[column], label=column, color='purple')
    plt.plot(data_interpolated[column][outliers[column]], 'r*', label='Outliers')
    plt.title(f'{column} (With Outliers Marked)')
    plt.xlabel('Date')
    plt.ylabel(column)
    plt.legend()
    plt.show()

# data_cleaned = data_interpolated
# Replace outliers with NaN
data_interpolated[outliers] = np.nan

# Fill NaN values resulting from outlier detection using linear interpolation
data_cleaned = data_interpolated.interpolate(method='linear', limit_direction='forward', axis=0)

# Calculate the mean value for CO(GT), excluding missing values
co_mean = data_cleaned['CO(GT)'].mean()

# Since NMHC(GT) has too many missing values,
# the few remaining valid readings behave like outliers,
# so we drop this feature from further processing
# Compute and plot the correlation matrix
plt.figure(figsize=(8, 8))
corr_matrix = data_cleaned.drop(columns='NMHC(GT)').corr()
sns.heatmap(corr_matrix, annot=True, cmap='Greens')
plt.title('Feature Correlation Matrix')
plt.show()
                               Min                  Max
Date           2004-03-10 00:00:00  2005-04-01 00:00:00
Time                      00:00:00             23:00:00
CO(GT)                      -200.0                 11.9
PT08.S1(CO)                 -200.0              2007.75
NMHC(GT)                      -200                 1189
C6H6(GT)                    -200.0            63.741476
PT08.S2(NMHC)               -200.0               2214.0
NOx(GT)                     -200.0               1479.0
PT08.S3(NOx)                -200.0              2682.75
NO2(GT)                     -200.0                339.7
PT08.S4(NO2)                -200.0               2775.0
PT08.S5(O3)                 -200.0              2522.75
T                           -200.0                 44.6
RH                          -200.0            87.174999
AH                          -200.0             2.231036
Tally of missing value (-200):
CO(GT)           1585
PT08.S1(CO)       366
NMHC(GT)         7525
C6H6(GT)          366
PT08.S2(NMHC)     366
NOx(GT)          1573
PT08.S3(NOx)      366
NO2(GT)          1576
PT08.S4(NO2)      366
PT08.S5(O3)       366
T                 366
RH                366
AH                366
dtype: int64

Step 2 & 3: Split Data, Prepare Features and Target, and Standardize the Data¶

Train size: 70%, Validation size: 15%, Test size: 15% (split chronologically to preserve time order).¶

The dropped features are:

'NOx(GT)': The target variable.

'NMHC(GT)', 'PT08.S4(NO2)', 'T', 'RH', 'AH': These features were dropped based on the correlation matrix above.

In [ ]:
# Split the data
n = len(data_cleaned)
train_size = int(n * 0.7)
val_size = int(n * 0.15)

train_data = data_cleaned[:train_size]
val_data = data_cleaned[train_size:train_size + val_size]
test_data = data_cleaned[train_size + val_size:]

# Drop features and create target
X_train = train_data.drop(columns=['NOx(GT)', 'NMHC(GT)', 'PT08.S4(NO2)', 'T', 'RH', 'AH'])
y_train = train_data['NOx(GT)']
X_val = val_data.drop(columns=['NOx(GT)', 'NMHC(GT)', 'PT08.S4(NO2)', 'T', 'RH', 'AH'])
y_val = val_data['NOx(GT)']
X_test = test_data.drop(columns=['NOx(GT)', 'NMHC(GT)', 'PT08.S4(NO2)', 'T', 'RH', 'AH'])
y_test = test_data['NOx(GT)']

# Standardize the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_val_scaled = scaler.transform(X_val)
X_test_scaled = scaler.transform(X_test)

Step 4: Build and Train the Neural Network¶

Total Number of Layers: 5 (excluding the input layer)¶

Dense Layer 1: 32 units, ReLU activation

Dropout Layer 1: 0.5 dropout rate

Dense Layer 2: 16 units, ReLU activation

Dropout Layer 2: 0.5 dropout rate

Output Layer: 1 unit, linear activation

Dropout Rate:¶

Higher Dropout Rate: More neurons dropped, stronger regularization, higher risk of underfitting.

Lower Dropout Rate: Fewer neurons dropped, weaker regularization, higher risk of overfitting.

L2 Regularization¶

0.01. By adding a penalty for large weights, L2 regularization helps to prevent the model from fitting the training data too closely, which can lead to better generalization on unseen data.

Training Parameters¶

Loss Function: mean_squared_error. This loss function measures the average squared difference between the actual and predicted values and is commonly used in regression problems.
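
As a toy illustration (made-up values, not project data), MSE and the RMSE reported in the evaluation step relate as follows:

import numpy as np

y_true = np.array([100.0, 150.0, 200.0])
y_pred = np.array([110.0, 140.0, 180.0])
mse = np.mean((y_true - y_pred) ** 2)  # (100 + 100 + 400) / 3 = 200.0
rmse = np.sqrt(mse)                    # ~14.14, in the same units as the target
print(mse, rmse)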

Optimizer:¶

Adam: the Adam (Adaptive Moment Estimation) optimizer is used for both tasks.

It combines the advantages of two other extensions of stochastic gradient descent, using both adaptive learning rates and momentum.

Learning Rate:¶

lr = 0.001. This controls the step size during the optimization process. A smaller learning rate can lead to more precise convergence but may require more epochs to train.

Batch Size:¶

64. The number of training samples used in one forward and backward pass. A smaller batch size requires less memory and provides more updates to the model weights, while a larger batch size provides a more accurate estimate of the gradient but requires more memory.

Epochs:¶

100. The number of times the entire training dataset is passed forward and backward through the neural network. More epochs can lead to better training but also increase the risk of overfitting.

In [ ]:
# Build the neural network
regression_model = Sequential([
    Input(shape=(X_train_scaled.shape[1],)),
    Dense(32, activation='relu', kernel_regularizer=l2(0.01)),
    Dropout(0.5),
    Dense(16, activation='relu', kernel_regularizer=l2(0.01)),
    Dropout(0.5),
    Dense(1, activation='linear')
])

# Compile the model
regression_model.compile(optimizer=Adam(learning_rate=0.001), loss='mean_squared_error', metrics=['mae'])

# Print the model summary
regression_model.summary()

# Train the neural network
history = regression_model.fit(X_train_scaled, y_train, epochs=100, batch_size=64, validation_data=(X_val_scaled, y_val))

# Plot training history
plt.figure(figsize=(8, 6))
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Estimation Task - Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()

# Save the model
regression_model.save('regression_model.keras')
Model: "sequential_1"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ dense_3 (Dense)                 │ (None, 32)             │           256 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_2 (Dropout)             │ (None, 32)             │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_4 (Dense)                 │ (None, 16)             │           528 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_3 (Dropout)             │ (None, 16)             │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_5 (Dense)                 │ (None, 1)              │            17 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 801 (3.13 KB)
 Trainable params: 801 (3.13 KB)
 Non-trainable params: 0 (0.00 B)
Epoch 1/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - loss: 61553.3867 - mae: 194.9862 - val_loss: 127764.2266 - val_mae: 314.1122
Epoch 2/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 560us/step - loss: 56309.1289 - mae: 184.6492 - val_loss: 109803.7109 - val_mae: 290.4947
Epoch 3/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 548us/step - loss: 46862.9961 - mae: 164.8429 - val_loss: 69975.9375 - val_mae: 231.1930
Epoch 4/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 563us/step - loss: 29353.3848 - mae: 125.5053 - val_loss: 37724.3789 - val_mae: 167.7085
Epoch 5/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 514us/step - loss: 22145.1211 - mae: 107.3039 - val_loss: 28362.0938 - val_mae: 143.8550
Epoch 6/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 484us/step - loss: 19503.3633 - mae: 100.4511 - val_loss: 22714.2012 - val_mae: 127.4551
Epoch 7/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 482us/step - loss: 18629.7969 - mae: 96.9599 - val_loss: 18079.9336 - val_mae: 111.9718
Epoch 8/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 477us/step - loss: 15648.2793 - mae: 88.6912 - val_loss: 13975.5244 - val_mae: 95.6824
Epoch 9/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 464us/step - loss: 15373.6426 - mae: 87.2180 - val_loss: 13150.4639 - val_mae: 90.5648
Epoch 10/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 460us/step - loss: 14087.3066 - mae: 83.8858 - val_loss: 11325.8213 - val_mae: 82.0093
Epoch 11/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 462us/step - loss: 14815.3525 - mae: 84.6452 - val_loss: 10954.9736 - val_mae: 79.1072
Epoch 12/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 467us/step - loss: 14528.0449 - mae: 84.2774 - val_loss: 10813.6152 - val_mae: 77.8448
Epoch 13/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 680us/step - loss: 13989.2568 - mae: 82.5738 - val_loss: 10347.5635 - val_mae: 75.1891
Epoch 14/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 461us/step - loss: 14303.2402 - mae: 82.8606 - val_loss: 10207.5859 - val_mae: 74.0061
Epoch 15/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 490us/step - loss: 13901.7510 - mae: 81.8750 - val_loss: 10129.0186 - val_mae: 73.4147
Epoch 16/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 528us/step - loss: 12838.4453 - mae: 79.3279 - val_loss: 9911.4990 - val_mae: 72.2322
Epoch 17/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 524us/step - loss: 14078.5820 - mae: 81.7651 - val_loss: 9531.6445 - val_mae: 71.0286
Epoch 18/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 529us/step - loss: 12992.3936 - mae: 79.3168 - val_loss: 8792.8701 - val_mae: 67.5859
Epoch 19/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 666us/step - loss: 12996.5869 - mae: 79.9756 - val_loss: 9815.4414 - val_mae: 71.5927
Epoch 20/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 537us/step - loss: 13475.7080 - mae: 80.0985 - val_loss: 9717.7061 - val_mae: 71.5147
Epoch 21/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 801us/step - loss: 13240.8662 - mae: 80.1650 - val_loss: 9857.5801 - val_mae: 71.6041
Epoch 22/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 555us/step - loss: 12675.4688 - mae: 77.8939 - val_loss: 9330.0000 - val_mae: 69.7612
Epoch 23/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 520us/step - loss: 13203.4893 - mae: 80.2110 - val_loss: 9238.9893 - val_mae: 69.2973
Epoch 24/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 525us/step - loss: 12804.1699 - mae: 78.8219 - val_loss: 9826.4824 - val_mae: 71.5120
Epoch 25/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 536us/step - loss: 12784.1260 - mae: 78.8041 - val_loss: 9522.9668 - val_mae: 70.1453
Epoch 26/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 528us/step - loss: 12536.2256 - mae: 78.0159 - val_loss: 10138.7188 - val_mae: 72.4841
Epoch 27/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 478us/step - loss: 13846.3311 - mae: 81.8266 - val_loss: 9967.9609 - val_mae: 71.7836
Epoch 28/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 502us/step - loss: 12799.3789 - mae: 78.8519 - val_loss: 9299.1025 - val_mae: 69.5760
Epoch 29/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 565us/step - loss: 12867.0176 - mae: 78.5506 - val_loss: 9519.6279 - val_mae: 70.2912
Epoch 30/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 547us/step - loss: 12205.2646 - mae: 77.4041 - val_loss: 9608.2207 - val_mae: 70.8540
Epoch 31/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 539us/step - loss: 13099.6113 - mae: 79.1797 - val_loss: 10170.4990 - val_mae: 72.6850
Epoch 32/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 523us/step - loss: 12527.9863 - mae: 77.7044 - val_loss: 9638.5850 - val_mae: 70.5168
Epoch 33/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 507us/step - loss: 13026.8936 - mae: 78.5673 - val_loss: 10066.0840 - val_mae: 72.2247
Epoch 34/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 484us/step - loss: 12670.7031 - mae: 78.3236 - val_loss: 9191.1729 - val_mae: 68.5695
Epoch 35/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 469us/step - loss: 12938.8750 - mae: 78.7875 - val_loss: 9594.1396 - val_mae: 70.0701
Epoch 36/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 476us/step - loss: 11926.0312 - mae: 76.0345 - val_loss: 9134.7383 - val_mae: 68.2888
Epoch 37/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 462us/step - loss: 12761.3818 - mae: 77.7265 - val_loss: 9197.1846 - val_mae: 68.5467
Epoch 38/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 459us/step - loss: 12321.4688 - mae: 76.7919 - val_loss: 9436.5576 - val_mae: 68.9770
Epoch 39/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 460us/step - loss: 13234.0264 - mae: 79.9129 - val_loss: 9799.7871 - val_mae: 70.3504
Epoch 40/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 465us/step - loss: 12219.4492 - mae: 76.9154 - val_loss: 9278.0664 - val_mae: 68.6299
Epoch 41/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 502us/step - loss: 12139.0332 - mae: 76.7123 - val_loss: 9597.4004 - val_mae: 70.0064
Epoch 42/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 522us/step - loss: 12638.3135 - mae: 77.5986 - val_loss: 9810.5029 - val_mae: 70.9134
Epoch 43/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 533us/step - loss: 11764.7744 - mae: 75.1614 - val_loss: 10029.1748 - val_mae: 71.7790
Epoch 44/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 523us/step - loss: 12304.1348 - mae: 76.0437 - val_loss: 9332.0098 - val_mae: 69.0334
Epoch 45/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 484us/step - loss: 12494.8711 - mae: 77.0571 - val_loss: 9866.5312 - val_mae: 70.6694
Epoch 46/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 463us/step - loss: 12959.2891 - mae: 78.1587 - val_loss: 10299.5127 - val_mae: 72.3165
Epoch 47/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 457us/step - loss: 12753.2432 - mae: 78.0977 - val_loss: 9373.9756 - val_mae: 68.7528
Epoch 48/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 465us/step - loss: 12382.4795 - mae: 77.2587 - val_loss: 10643.3428 - val_mae: 73.0531
Epoch 49/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 460us/step - loss: 12321.1436 - mae: 76.4589 - val_loss: 9712.1797 - val_mae: 69.8050
Epoch 50/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 462us/step - loss: 12285.7109 - mae: 77.1056 - val_loss: 10148.6084 - val_mae: 71.3279
Epoch 51/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 468us/step - loss: 12316.6621 - mae: 76.0121 - val_loss: 9230.6982 - val_mae: 67.9730
Epoch 52/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 507us/step - loss: 12293.9346 - mae: 76.4706 - val_loss: 9969.6631 - val_mae: 70.9373
Epoch 53/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 510us/step - loss: 12349.3447 - mae: 76.5777 - val_loss: 9518.0586 - val_mae: 69.0940
Epoch 54/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 482us/step - loss: 11632.0527 - mae: 75.5078 - val_loss: 8882.2598 - val_mae: 66.8397
Epoch 55/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 494us/step - loss: 12574.0801 - mae: 77.3774 - val_loss: 10018.7988 - val_mae: 70.6655
Epoch 56/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 474us/step - loss: 12063.2363 - mae: 75.0671 - val_loss: 9369.6074 - val_mae: 68.3626
Epoch 57/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 471us/step - loss: 12649.2158 - mae: 78.0780 - val_loss: 10100.3994 - val_mae: 71.1313
Epoch 58/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 495us/step - loss: 12976.7744 - mae: 77.7949 - val_loss: 9727.9082 - val_mae: 70.0122
Epoch 59/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 500us/step - loss: 11814.4697 - mae: 75.4951 - val_loss: 9337.5225 - val_mae: 68.6653
Epoch 60/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 504us/step - loss: 12544.5303 - mae: 76.5813 - val_loss: 9820.8877 - val_mae: 69.9761
Epoch 61/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 494us/step - loss: 12075.5342 - mae: 76.3760 - val_loss: 9682.5010 - val_mae: 69.3724
Epoch 62/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 494us/step - loss: 12299.7588 - mae: 75.6792 - val_loss: 9881.9834 - val_mae: 70.1546
Epoch 63/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 541us/step - loss: 12263.9561 - mae: 75.6416 - val_loss: 9501.0566 - val_mae: 68.9036
Epoch 64/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 463us/step - loss: 11976.6904 - mae: 75.1093 - val_loss: 8847.6484 - val_mae: 66.2224
Epoch 65/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 465us/step - loss: 12309.9092 - mae: 75.5744 - val_loss: 9333.6875 - val_mae: 68.0616
Epoch 66/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 464us/step - loss: 12519.4707 - mae: 75.8448 - val_loss: 9826.2295 - val_mae: 69.6995
Epoch 67/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 460us/step - loss: 12433.1846 - mae: 75.7059 - val_loss: 9607.3135 - val_mae: 69.1450
Epoch 68/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 465us/step - loss: 13265.8340 - mae: 78.8109 - val_loss: 9492.5986 - val_mae: 68.7849
Epoch 69/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 466us/step - loss: 12460.4414 - mae: 76.3447 - val_loss: 8752.6709 - val_mae: 65.8741
Epoch 70/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 510us/step - loss: 12287.9883 - mae: 76.1733 - val_loss: 9401.2598 - val_mae: 68.2097
Epoch 71/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 502us/step - loss: 12419.7314 - mae: 75.8472 - val_loss: 9565.0693 - val_mae: 68.6724
Epoch 72/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 494us/step - loss: 12032.4414 - mae: 73.8913 - val_loss: 9838.4258 - val_mae: 69.7803
Epoch 73/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 504us/step - loss: 11856.9561 - mae: 74.8298 - val_loss: 9567.6572 - val_mae: 68.7666
Epoch 74/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 484us/step - loss: 12120.1768 - mae: 74.7816 - val_loss: 9154.5293 - val_mae: 67.2800
Epoch 75/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 461us/step - loss: 12075.4922 - mae: 75.5877 - val_loss: 8929.6562 - val_mae: 66.4332
Epoch 76/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 457us/step - loss: 12318.8320 - mae: 75.4315 - val_loss: 9364.3418 - val_mae: 68.0182
Epoch 77/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 458us/step - loss: 11805.5283 - mae: 74.5429 - val_loss: 9179.6592 - val_mae: 67.3328
Epoch 78/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 464us/step - loss: 11737.1621 - mae: 73.4922 - val_loss: 8847.7197 - val_mae: 66.0088
Epoch 79/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 464us/step - loss: 11996.5840 - mae: 75.3560 - val_loss: 10063.8457 - val_mae: 70.3492
Epoch 80/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 485us/step - loss: 12316.4824 - mae: 75.6154 - val_loss: 9203.7275 - val_mae: 67.1881
Epoch 81/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 510us/step - loss: 11894.1025 - mae: 74.4802 - val_loss: 9036.0244 - val_mae: 66.5145
Epoch 82/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 513us/step - loss: 12576.9785 - mae: 76.5851 - val_loss: 9430.8184 - val_mae: 67.9979
Epoch 83/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 742us/step - loss: 11370.8438 - mae: 73.0973 - val_loss: 8984.6514 - val_mae: 66.3147
Epoch 84/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 506us/step - loss: 12180.0342 - mae: 75.1828 - val_loss: 9437.6885 - val_mae: 68.1385
Epoch 85/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 470us/step - loss: 12081.4775 - mae: 75.1788 - val_loss: 9333.6240 - val_mae: 67.6690
Epoch 86/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 467us/step - loss: 11579.7793 - mae: 74.5770 - val_loss: 9008.8701 - val_mae: 66.5992
Epoch 87/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 465us/step - loss: 11822.9160 - mae: 73.8592 - val_loss: 9261.8086 - val_mae: 67.5264
Epoch 88/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 463us/step - loss: 11138.4453 - mae: 72.8087 - val_loss: 8982.2295 - val_mae: 66.2923
Epoch 89/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 463us/step - loss: 12033.8340 - mae: 74.0803 - val_loss: 8331.1182 - val_mae: 63.7325
Epoch 90/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 460us/step - loss: 11630.2344 - mae: 73.3748 - val_loss: 9220.3457 - val_mae: 67.0371
Epoch 91/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 496us/step - loss: 12340.1455 - mae: 74.9579 - val_loss: 9401.8184 - val_mae: 67.6634
Epoch 92/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 497us/step - loss: 11881.9814 - mae: 74.3903 - val_loss: 9034.6484 - val_mae: 66.3853
Epoch 93/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 503us/step - loss: 11713.7549 - mae: 73.9831 - val_loss: 9518.6611 - val_mae: 68.0643
Epoch 94/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 503us/step - loss: 12265.2979 - mae: 75.0063 - val_loss: 8875.3232 - val_mae: 65.8139
Epoch 95/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 478us/step - loss: 11397.8691 - mae: 72.6730 - val_loss: 9087.3535 - val_mae: 66.5081
Epoch 96/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 462us/step - loss: 11859.5020 - mae: 74.9775 - val_loss: 9282.6094 - val_mae: 67.1720
Epoch 97/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 462us/step - loss: 11596.9639 - mae: 73.5725 - val_loss: 9373.8594 - val_mae: 67.6800
Epoch 98/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 463us/step - loss: 11811.4082 - mae: 74.7630 - val_loss: 9583.4844 - val_mae: 68.3340
Epoch 99/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 456us/step - loss: 11906.4355 - mae: 75.0394 - val_loss: 8935.8076 - val_mae: 66.1234
Epoch 100/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 464us/step - loss: 12420.6582 - mae: 76.0137 - val_loss: 9181.8975 - val_mae: 67.0625

Step 5: Evaluate the Model [Answering 2.4(c)(d)]¶

Check Possible Overfitting¶

Verify the training and validation loss: if the training loss continues to decrease while the validation loss starts to increase, this indicates overfitting.

In [ ]:
# Evaluate the regression model on the test set
test_loss, test_mae = regression_model.evaluate(X_test_scaled, y_test)
print(f"Test Mean Absolute Error: {test_mae:.4f}")

# Predict NOx concentrations on the validation set
y_val_pred = regression_model.predict(X_val_scaled)

# Plot true vs predicted NOx concentrations
plt.figure(figsize=(20, 6))
plt.plot(y_val.values, label='MLP Actual NOx(GT) Values (Validation)')
plt.plot(y_val_pred, label='MLP Estimated NOx(GT) Values (Validation)')
plt.title('MLP Validation Phase - Actual vs Estimated NOx(GT) Values')
plt.xlabel('Sample Index')
plt.ylabel('NOx(GT) Value')
plt.legend()
plt.show()

# Predict NOx concentrations on the test set
y_test_pred = regression_model.predict(X_test_scaled)

# Calculate RMSE and MAE of the test set
rmse = np.sqrt(mean_squared_error(y_test, y_test_pred))

mae = mean_absolute_error(y_test, y_test_pred)

# Number of samples
n_samples = len(y_test)

# Create a DataFrame to display the performance index table
regression_results_index_table = pd.DataFrame({
    'RMSE': [f'{rmse:.4f}'],
    'MAE': [f'{mae:.4f}'],
    '#Samples': [n_samples]
})

print(regression_results_index_table)

fig, ax = plt.subplots(figsize=(4, 1))

ax.xaxis.set_visible(False)
ax.yaxis.set_visible(False)
ax.set_frame_on(False)

regression_index_table = ax.table(cellText=regression_results_index_table.values,
                 colLabels=regression_results_index_table.columns,
                 cellLoc='center', 
                 loc='center')

plt.title('Regression Results Index Table', fontsize=14)
plt.show()
40/40 ━━━━━━━━━━━━━━━━━━━━ 0s 287us/step - loss: 5058.7817 - mae: 55.3928
Test Mean Absolute Error: 47.9600
40/40 ━━━━━━━━━━━━━━━━━━━━ 0s 544us/step
40/40 ━━━━━━━━━━━━━━━━━━━━ 0s 306us/step
      RMSE      MAE  #Samples
0  63.7866  47.9600      1255

References:

[1] Dekking, Frederik Michel; Kraaikamp, Cornelis; Lopuhaä, Hen Paul; Meester, Ludolf Erwin (2005). A Modern Introduction to Probability and Statistics. Springer Texts in Statistics. London: Springer London. doi:10.1007/1-84628-168-7. ISBN 978-1-85233-896-1.